Continuum limit of total variation on point clouds
We consider point clouds obtained as random samples of a measure on a
Euclidean domain. A graph representing the point cloud is obtained by assigning
weights to edges based on the distance between the points they connect. Our
goal is to develop mathematical tools needed to study the consistency, as the
number of available data points increases, of graph-based machine learning
algorithms for tasks such as clustering. In particular, we study when the
cut capacity, and more generally the total variation, on these graphs is a good
approximation of the perimeter (total variation) in the continuum setting. We
address this question in the setting of Γ-convergence. We obtain almost
optimal conditions on the scaling, as the number of points increases, of the size
of the neighborhood over which the points are connected by an edge for the
Γ-convergence to hold. Taking the limit is enabled by a transportation-based
metric which allows one to suitably compare functionals defined on different
point clouds.
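
As a rough illustration of the graph construction described above, the sketch below builds a neighborhood graph on a random sample and evaluates a graph total variation functional for a candidate cluster indicator. The function name, the 0/1 edge weights, and the 1/(n^2 * eps) normalization are illustrative choices for this example, not the precise setup of the paper.

```python
import numpy as np

def graph_total_variation(points, u, eps):
    """Graph total variation of a labeling u on an eps-neighborhood graph.

    Edges connect pairs of points at distance less than eps; the weights here
    are plain 0/1 indicators, and the 1/(n^2 * eps) scaling is one common
    normalization (other kernels and scalings appear in the literature).
    """
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    weights = (dists < eps).astype(float)
    np.fill_diagonal(weights, 0.0)
    return np.sum(weights * np.abs(u[:, None] - u[None, :])) / (n**2 * eps)

# A half-space indicator on a uniform sample of the unit square: its graph
# total variation should approximate the length of the continuum cut.
rng = np.random.default_rng(0)
pts = rng.uniform(size=(500, 2))
labels = (pts[:, 0] < 0.5).astype(float)
print(graph_total_variation(pts, labels, eps=0.1))
```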
Geometric structure of graph Laplacian embeddings
We analyze the spectral clustering procedure for identifying coarse structure in a data set x_1, …, x_n, and in particular study the geometry of graph Laplacian embeddings which form the basis for spectral clustering algorithms. More precisely, we assume that the data is sampled from a mixture model supported on a manifold M embedded in R^d, and pick a connectivity length-scale Δ>0 to construct a kernelized graph Laplacian. We introduce a notion of a well-separated mixture model which only depends on the model itself, and prove that when the model is well separated, with high probability the embedded data set concentrates on cones that are centered around orthogonal vectors. Our results are meaningful in the regime where Δ=Δ(n) is allowed to decay to zero at a slow enough rate as the number of data points grows. This rate depends on the intrinsic dimension of the manifold on which the data is supported.
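
A minimal numpy sketch of the kind of kernelized graph Laplacian embedding discussed above; the Gaussian kernel, the unnormalized Laplacian, and the toy two-component mixture are illustrative assumptions rather than the exact construction analyzed in the paper.

```python
import numpy as np

def laplacian_embedding(X, eps, k):
    """Embed points via the first k nontrivial eigenvectors of a kernelized
    graph Laplacian built with connectivity length-scale eps.

    Uses a Gaussian kernel and the unnormalized Laplacian L = D - W; other
    normalizations (random-walk, symmetric) are equally common.
    """
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2 * eps ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    return vecs[:, 1:k + 1]          # skip the constant eigenvector

# For a well-separated two-component mixture, the embedded points should
# concentrate near two nearly orthogonal directions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (100, 2)), rng.normal(2.0, 0.1, (100, 2))])
emb = laplacian_embedding(X, eps=0.3, k=2)
```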
It begins with a boundary: A geometric view on probabilistically robust learning
Although deep neural networks have achieved super-human performance on many
classification tasks, they often exhibit a worrying lack of robustness towards
adversarially generated examples. Thus, considerable effort has been invested
into reformulating Empirical Risk Minimization (ERM) into an adversarially
robust framework. Recently, attention has shifted towards approaches which
interpolate between the robustness offered by adversarial training and the
higher clean accuracy and faster training times of ERM. In this paper, we take
a fresh and geometric view on one such method -- Probabilistically Robust
Learning (PRL) (Robey et al., ICML, 2022). We propose a geometric framework for
understanding PRL, which allows us to identify a subtle flaw in its original
formulation and to introduce a family of probabilistic nonlocal perimeter
functionals to address this. We prove existence of solutions using novel
relaxation methods and study properties as well as local limits of the
introduced perimeters.
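
To make the interpolation concrete, here is a hedged Monte Carlo sketch of a probabilistically robust 0-1 loss in the spirit of PRL: a point counts as an error only when the estimated probability of misclassification under random perturbations exceeds a tolerance rho. The sampling scheme, perturbation set, and names are illustrative and are not the exact formulation of Robey et al. or of this paper.

```python
import numpy as np

def prob_robust_error(predict, x, y, radius, rho, n_samples=200, rng=None):
    """Monte Carlo estimate of a probabilistically robust 0-1 loss.

    The point (x, y) is counted as an error if the fraction of uniformly
    sampled perturbations within the given radius on which the classifier
    disagrees with y exceeds the tolerance rho. Small rho approaches
    (sampled) adversarial training; larger rho tolerates more perturbed
    errors, moving toward average-case behaviour.
    """
    rng = rng or np.random.default_rng()
    deltas = rng.uniform(-radius, radius, size=(n_samples,) + x.shape)
    preds = np.array([predict(x + d) for d in deltas])
    return float(np.mean(preds != y) > rho)

# Toy linear classifier evaluated at a point near its decision boundary.
predict = lambda z: int(z[0] + z[1] > 0.0)
print(prob_robust_error(predict, np.array([0.05, -0.02]), 1, radius=0.2, rho=0.1))
```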
On the regularized risk of distributionally robust learning over deep neural networks
In this paper, we explore the relation between distributionally robust learning and different forms of regularization to enforce robustness of deep neural networks. In particular, starting from a concrete min-max distributionally robust problem, and using tools from optimal transport theory, we derive first-order and second-order approximations to the distributionally robust problem in terms of appropriate regularized risk minimization problems. In the context of deep ResNet models, we identify the structure of the resulting regularization problems as mean-field optimal control problems where the number and dimension of state variables are within a dimension-free factor of the dimension of the original unrobust problem. Using the Pontryagin maximum principles associated with these problems, we motivate a family of scalable algorithms for the training of robust neural networks. Our analysis recovers some results and algorithms known in the literature (in settings explained throughout the paper) and provides many other theoretical and algorithmic insights that to our knowledge are novel. In our analysis, we employ tools that we deem useful for a future analysis of more general adversarial learning problems.
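
A small illustration of the flavour of first-order approximation described above, for plain logistic regression rather than a deep ResNet: the robust objective is replaced by the empirical risk plus a multiple of the average input-gradient norm of the loss. The closed-form gradient, the choice of norm, and the scaling are assumptions made for this sketch; the paper's regularizers for ResNet models take a more involved mean-field optimal control form.

```python
import numpy as np

def first_order_robust_risk(w, X, y, eps):
    """Empirical logistic risk plus eps times the mean input-gradient norm,
    a first-order surrogate for a min-max distributionally robust objective.

    For the logistic loss l(w; x, y) = log(1 + exp(-y * w.x)), the gradient
    with respect to the input is -y * sigmoid(-y * w.x) * w, so its norm is
    sigmoid(-margin) * ||w||.
    """
    margins = y * (X @ w)                          # labels y in {-1, +1}
    losses = np.log1p(np.exp(-margins))
    grad_norms = np.linalg.norm(w) / (1.0 + np.exp(margins))
    return losses.mean() + eps * grad_norms.mean()

# Tiny example: the robust surrogate upper-bounds the plain empirical risk.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]))
w = np.array([0.8, -1.5, 0.3])
print(first_order_robust_risk(w, X, y, eps=0.0),
      first_order_robust_risk(w, X, y, eps=0.1))
```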
On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests
Nonparametric two-sample or homogeneity testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having been designed and analyzed, both for the unidimensional and the multivariate setting. In this short survey, we focus on test statistics that involve the Wasserstein distance. Using an entropic smoothing of the Wasserstein distance, we connect these to very different tests including multivariate methods involving energy statistics and kernel based maximum mean discrepancy and univariate methods like the Kolmogorov-Smirnov test, probability or quantile (PP/QQ) plots and receiver operating characteristic or ordinal dominance (ROC/ODC) curves. Some observations are implicit in the literature, while others seem to have not been noticed thus far. Given nonparametric two-sample testing's classical and continued importance, we aim to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.
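
In the univariate case the Wasserstein statistic reduces to a comparison of sorted samples (equivalently, empirical quantile functions), which is essentially the PP/QQ connection mentioned above. The sketch below computes the p=1 statistic for equal-sized samples and calibrates it with a generic permutation test; the equal-size restriction and the permutation calibration are simplifying assumptions of this example rather than prescriptions from the survey.

```python
import numpy as np

def wasserstein_1d(x, y):
    """p=1 Wasserstein distance between two equal-sized empirical samples:
    the mean absolute difference of their order statistics."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

def permutation_pvalue(x, y, n_perm=999, rng=None):
    """Two-sample test: permutation p-value for the Wasserstein statistic."""
    rng = rng or np.random.default_rng()
    observed = wasserstein_1d(x, y)
    pooled = np.concatenate([x, y])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        exceed += wasserstein_1d(perm[:len(x)], perm[len(x):]) >= observed
    return (exceed + 1) / (n_perm + 1)

# Samples from shifted distributions should yield a small p-value.
rng = np.random.default_rng(3)
print(permutation_pvalue(rng.normal(0, 1, 200), rng.normal(0.5, 1, 200)))
```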